---
title-block-banner: true
title: "Paper RBCP 2025"
lang: en
author: 
  - id: ts
    name: Teresa Sacchet
    degrees: PhD
    email: teresa.sacchet@gmail.com
    orcid: 0000-0002-1592-2169
    affiliations:
      - ref: ufrgs
  - id: hma
    name: Hannah Maruci Aflalo
    email: hannahmaruci@gmail.com
    orcid: 0000-0002-4820-2942
    degrees: PhD
    affiliations:
      - ref: cebrap
  - id: mvca
    name: Marcus Vinícius Chevitarese Alves
    email: mchevita@gmail.com
    orcid: 0009-0002-6265-1004
    degrees: PhD
    affiliations:
      - ref: cefor
      - ref: onmp
  - id: vsc
    name: Vanilda Souza Chaves
    email: vanildachaves@gmail.com
    orcid: 0000-0002-0349-3862
    degrees: MsSc
    affiliations:
      - ref: usp
affiliations: 
  - id: ufrgs
    name: Federal University of Rio Grande do Sul (UFRGS)
  - id: cebrap
    name: Brazilian Centre for Analysis and Planning (CEBRAP) 
  - id: cefor
    name: Center for Training and Improvement - Chamber of Deputies (CEFOR)
  - id: onmp
    name: National Observatory of Women in Politics - Chamber of Deputies (ONMP)
  - id: usp
    name: University of São Paulo (USP)
    
format: 
  html:
    toc: true
    code-fold: true
    code-tools: true
    df-print: paged
    link-external-icon: true
    html-math-method: katex
editor: visual
---

```{css, echo = FALSE}
.output {
max-width: 1000px;
max-height: 500px;
overflow-x: scroll;
overflow-y: scroll;
}
```

------------------------------------------------------------------------

# Presentation

We produced this document, in Quarto notebook format as a subsidy for the analysis of data from the article to be published in the Brazilian Journal of Political Science (RBCP).

The original data were obtained from the [TSE Open Data Portal](https://dadosabertos.tse.jus.br/ "TSE Open Data Portal") and were subsequently subjected to a cleaning and unification process. This process is not entirely covered by this document, which focuses on the analysis of the data itself. However, details about the process can be obtained through the authors' emails.

# Objective and hypotheses

This study aims to conduct an exploratory analysis of the accountability data of candidates for proportional positions (federal deputy and state deputy) in the 2022 elections, comparing them to the data from the 2018 elections and making a gender and color/race cut, with a view to verifying whether there were differences in the distribution of revenues by these groups, in the context of the new electoral legal norms that took effect in the last elections.

# Environment setup

## Clear all objects

```{r}
#| warning: false
#| message: false
#| label: clear-objects

rm(list=ls())
```

## Load required packages

```{r}
#| warning: false
#| message: false
#| label: load-required-packages

library(tidyr)
library(dplyr)
library(stringr)
library(knitr)
library(DescTools)
library(ggplot2)
library(summarytools)
library(afex)
library(ggpubr)
library(broom)
library(openxlsx)
library(lubridate)
library(scales)
library(patchwork)
library(Cairo)
library(viridis)
library(flextable)
library(officer)
```

⟶ `{r} length(.packages())` libraries loaded.

# Data preparation

::: callout-note
We used three initial datasets in this work: two with campaign revenue data and one with candidate data.

The first campaign revenue dataset was aggregated by funding source; the second was aggregated by period (week).

We also generated three codebooks, one for each initial dataset. The remaining datasets used in this notebook were based on these.
:::

## Campaign revenue data preparation

### Campaign revenue data by source loading

Upload candidate income data by income source and funding source

Data generation date: 2023-09-26

```{r}
#| warning: false
#| message: false
#| label: campaign-revenue-data-loading

nome_base_receitas_fonte <- "../Dados/candidatos_2018_2022_receitas_por_fonte_2023_09_26.dsv"

# Receitas sem formatação prévia
receitas_fonte <- read.csv(
  nome_base_receitas_fonte,
  header = TRUE,
  sep = ";",                     
  fileEncoding = "utf-8",
  dec = ",",
  stringsAsFactors = FALSE)
```

⟶ `{r} nrow(receitas_fonte)` lines loaded.

### Create the codebook for *receitas_fonte* database

```{r}
#| warning: false
#| message: false
#| label: code-book-1

descriptions <- c('Election year',
                  'Position description',
                  'State acronym',
                  'Party acronyn',
                  'Candidate name',
                  'Candidate ID',
                  'Candidate sequential',
                  'Candidate gender description',
                  'Race description',
                  'Revenue source description',
                  'Resource source description',
                  'Revenue kind description',
                  'Donor ID',
                  'Campaign accountant ID',
                  'Donor candidate sequential',
                  'Donor candidate gender',
                  'Donor candidate race',
                  'Total revenue',
                  'Vote tally status',
                  'Reelection status')

code_book_1 <- data.frame(variable_name = names(receitas_fonte),
                          variable_class = sapply(receitas_fonte, class),
                          variable_desc = descriptions)

code_book_1
```

Save codebook 1

```{r}
#| warning: false
#| message: false
#| label: save-codebook-1

write.xlsx(code_book_1, 
           '..//Dados//Code_Book_1.xlsx',
           sheetName = "codebook1")
```

### Standardizing campaign revenue data by source

```{r}
#| warning: false
#| message: false
#| label: revenue-data-by-source-standardization

# prepara dados de receitas para a análise
receitas <- receitas_fonte %>%
  mutate(
    ano_eleicao = as.factor(ANO_ELEICAO),
    NR_CPF_CANDIDATO = as.character(NR_CPF_CANDIDATO),
    uf = as.factor(SG_UF),
    genero = case_when(
      str_trim(DS_GENERO) == 'FEMININO' ~ 'Women',
      str_trim(DS_GENERO) == 'MASCULINO' ~ 'Men',
      TRUE ~ 'Others'
    ),
    genero_doador = case_when(
      str_trim(DS_GENERO_DOADOR) == 'FEMININO' ~ 'Women',
      str_trim(DS_GENERO_DOADOR) == 'MASCULINO' ~ 'Men',
      TRUE ~ 'Others'
    ),
    cor_raca = case_when(
      DS_COR_RACA == 'PARDA' | DS_COR_RACA == 'PRETA' ~ 'Black',
      DS_COR_RACA == 'BRANCA' ~ 'White',
      TRUE ~ 'Other races'
    ),
    cor_raca_doador = case_when(
      DS_COR_RACA_DOADOR == 'PARDA' | DS_COR_RACA_DOADOR == 'PRETA' ~ 'Black',
      DS_COR_RACA_DOADOR == 'BRANCA' ~ 'White',
      TRUE ~ 'Other races'
    ),
    eleito = case_when(
      DS_SIT_TOT_TURNO %like% 'ELEITO%' ~ 'Elected',
      TRUE ~ 'Not elected'
    ),
    fonte_receita = str_to_upper(DS_FONTE_RECEITA),
    tipo_fonte_receita = case_when(
      fonte_receita == 'FUNDO ESPECIAL' ~ 'FEFC',
      fonte_receita == 'FUNDO PARTIDARIO' ~ 'FP',
      TRUE ~ 'Outros recursos'
    ),
    tipo_recurso = case_when(
      tipo_fonte_receita == 'Outros recursos' ~ 'Private',
      TRUE ~ 'Public'
    ),
    especie_recurso = case_when(
      DS_ESPECIE_RECEITA == 'Estimado' ~ 'Cash-equivalent',
      TRUE ~ 'Cash'
    )
  )

rm(receitas_fonte)
```

⟶ `{r} nrow(receitas)` lines standardized.

### Loading campaign resource data by time

Uploads a database of income from eligible candidates with detailed declared income by time, in 2018 and 2022

```{r}
#| warning: false
#| message: false
#| label: campaign-resource-data-by-time-loading

nome_base_receitas <- "../Dados/receitas_por_data_e_origem_2018_2022_v_2023_09_27.dsv"

# Database with raw data
receitas_por_tempo_sem_form <- read.csv(
  nome_base_receitas,
  header = TRUE,
  sep = ";",                     
  fileEncoding = "utf-8",
  dec = ",",
  stringsAsFactors = FALSE,
  colClasses=c("NR_CPF_CANDIDATO"="character"))
```

⟶ `{r} nrow(receitas_por_tempo_sem_form)` lines loaded.

### Create the codebook for *receitas_por_tempo_sem_form* database

```{r}
#| warning: false
#| message: false
#| label: code-book-2

descriptions_2 <- c('Election year',
                    'Position description',
                    'State acronym',
                    'Revenue source description',
                    'Resource source description',
                    'Revenue kind description',
                    'Revenue date',
                    'Candidate sequential',
                    'Candidate ID',
                    'Total revenue')

code_book_2 <- data.frame(
  variable_name = names(receitas_por_tempo_sem_form),
  variable_class = sapply(receitas_por_tempo_sem_form, class),
  variable_desc = descriptions_2)

code_book_2 %>% 
  select(variable_name, 
         variable_class, 
         variable_desc)
```

Save codebook 2

```{r}
#| warning: false
#| message: false
#| label: save-codebook-2

write.xlsx(code_book_2, 
           '..//Dados//Code_Book_2.xlsx',
           sheetName = "codebook2")
```

## Candidacy data

It loads the database of candidates in 2018 and 2022 and shows the overall total of candidates -- in this case, no filter was applied, therefore, candidates classified as "irregular" by the TSE were also considered.

### Loading candidacy data

```{r}
#| warning: false
#| message: false
#| label: candidacy-data-loading

nome_base_candidatos <- "../Dados/candidatos_2018_2022_v_2023_09_27.dsv"

candidatos_sem_form <- read.csv(
  nome_base_candidatos,
  header = TRUE,
  sep = ";",                     
  fileEncoding = "utf-8",
  stringsAsFactors = FALSE,
  colClasses=c("NR_CPF_CANDIDATO"="character"))
```

⟶ `{r} nrow(candidatos_sem_form)` lines loaded.

```{r}
#| warning: false
#| message: false
#| label: code-book-3

descriptions_3 <- c("Election year",
                    "State name",
                    "State acronym",
                    "Position description",
                    "Party acronyn",
                    "Candidate name",
                    "Candidate's ballot box name",
                    "Candidate sequential ID",
                    "Candidate ID",
                    "Candidacy status description",
                    "Candidacy detailed status description",
                    "Vote tally status",
                    "Nationality of the candidate",
                    "Acronym of place (state) of birth",
                    "Date of birth",
                    "Age at the time of taking office",
                    "Vote registration number",
                    "Candidate gender description",
                    "Education level description",
                    "Race code",
                    "Race description",
                    "Occupation code",
                    "Occupation description",
                    'Reelection status')

code_book_3 <- data.frame(
  variable_name = names(candidatos_sem_form),
  variable_class = sapply(candidatos_sem_form, class),
  variable_desc = descriptions_3)

code_book_3
```

Save codebook 3

```{r}
#| warning: false
#| message: false
#| label: save-codebook-3

write.xlsx(code_book_3, 
           '..//Dados//Code_Book_3.xlsx',
           sheetName = "codebook3")
```

## Data preparation

### Standardizing variable names

We made some changes to the names of the original TSE variables, as well as some aggregations and formatting to simplifly the analysis. Both the original and new names are kept in Brazilian Portuguese.

+--------------------+--------------------+------------------------------------------------------------------------------+
| Original name      | New name           | Other modifications                                                          |
+====================+====================+==============================================================================+
| ANO_ELEICAO        | ano_eleicao        | --                                                                           |
+--------------------+--------------------+------------------------------------------------------------------------------+
| SG_UF              | uf                 | --                                                                           |
+--------------------+--------------------+------------------------------------------------------------------------------+
| DT_RECEITA         | data_receita       | Conversion to format dd/mm/aaaa                                              |
+--------------------+--------------------+------------------------------------------------------------------------------+
| DS_CARGO           | cargo              | --                                                                           |
+--------------------+--------------------+------------------------------------------------------------------------------+
| DS_GENERO          | genero             | Removing blanking spaces                                                     |
+--------------------+--------------------+------------------------------------------------------------------------------+
| DS_COR_RACA        | cor_raca           | If DS_COR_RACA = 'PARDA' or 'PRETA' then cor_raca = 'Blak'                   |
|                    |                    |                                                                              |
|                    |                    | If DS_COR_RACA = 'BRANCA' then cor_raca = 'White'                            |
|                    |                    |                                                                              |
|                    |                    | Else cor_raca = 'Other races'                                                |
+--------------------+--------------------+------------------------------------------------------------------------------+
| DS_SIT_TOT_TURNO   | eleito             | If DS_SIT_TURNO = 'Eleito%' then eleito = 'Elected'                          |
|                    |                    |                                                                              |
|                    |                    | Else eleito = 'Not elected'                                                  |
+--------------------+--------------------+------------------------------------------------------------------------------+
| --                 | tipo_fonte_receita | If fonte_receita = 'FUNDO ESPECIAL' then tIpo_fonte_receita = 'FEFC'         |
|                    |                    |                                                                              |
|                    |                    | If fonte_receita = 'FUNDO PARTIDARIO' then tipo_fonte_receita = 'FP'         |
|                    |                    |                                                                              |
|                    |                    | Else tipo_fonte_receita = 'Outros recursos'                                  |
+--------------------+--------------------+------------------------------------------------------------------------------+
| DS_ESPECIE_RECEITA | especie_receita    | If DS_ESPECIE_RECEITA = 'Estimado' então especie_receita = 'Cash-equivalent' |
|                    |                    |                                                                              |
|                    |                    | Then especie_receita = 'Cash'                                                |
+--------------------+--------------------+------------------------------------------------------------------------------+

### Standardizing campaign revenue data by time

```{r}
#| warning: false
#| message: false
#| label: campaign-revenue-data-standardization

# prepara dados de receitas para a análise
receitas_por_tempo <- receitas_por_tempo_sem_form %>%
  mutate(fonte_receita = str_to_upper(DS_FONTE_RECEITA)) %>%
  mutate(
    ano_eleicao = as.factor(ANO_ELEICAO),
    cargo = str_to_upper(DS_CARGO),
    uf = as.factor(SG_UF),
    sq_candidato = SQ_CANDIDATO,
    cpf_candidato = as.character(NR_CPF_CANDIDATO),
    data_receita = as.Date(DT_RECEITA, "%d/%m/%Y"),
    tipo_fonte_receita = case_when(
      fonte_receita == 'FUNDO ESPECIAL' ~ 'FEFC',
      fonte_receita == 'FUNDO PARTIDARIO' ~ 'FP',
      TRUE ~ 'Outros recursos'
    ),
    especie_receita = case_when(
      DS_ESPECIE_RECEITA == 'Estimado' ~ 'Cash-equivalent',
      TRUE ~ 'Cash'),
    total_receita_especie = TOTAL_RECEITA
  )

rm(receitas_por_tempo_sem_form)
```

⟶ `{r} nrow(receitas_por_tempo)` lines standardized.

### Standardizing candidacies data

```{r}
#| warning: false
#| message: false
#| label: candidacies-data-standardization

# prepara dados de candidatos para a análise
candidatos <- candidatos_sem_form %>%
  filter(DS_SITUACAO_CANDIDATURA == "APTO") %>%
  mutate(
    ano_eleicao = as.factor(ANO_ELEICAO),
    uf = as.factor(SG_UF),
    cargo = str_to_upper(DS_CARGO_CONCORRIDO),
    partido = SG_PARTIDO,
    cpf_candidato = as.character(NR_CPF_CANDIDATO),
    sq_candidato = SQ_CANDIDATO,
    genero = case_when(
      str_trim(DS_GENERO) == 'FEMININO' ~ 'Women',
      str_trim(DS_GENERO) == 'MASCULINO' ~ 'Men',
      TRUE ~ 'Others'
    ),
    cor_raca = case_when(
      DS_COR_RACA == 'PARDA' | DS_COR_RACA == 'PRETA' ~ 'Black',
      DS_COR_RACA == 'BRANCA' ~ 'White',
      TRUE ~ 'Other races'
    ),
    eleito = case_when(
      DS_SIT_TOT_TURNO %like% 'ELEITO%' ~ 'Elected',
      TRUE ~ 'Not elected'
    ),
    reeleicao = case_when(
      str_trim(ST_REELEICAO) == 'N' ~ 'Challenger',
      TRUE ~ 'Reelection'
    )
  )

rm(candidatos_sem_form)
```

⟶ `{r} nrow(candidatos)` lines standardized.

### Data aggregation

#### Step 1 - Combining campaign revenue and candidate data

In this step, we join revenue and candidacies data to calculate the proportion of candidates. This is necessary because not all candidates necessarily submit financial statements, as they may have their candidacy registration denied or even withdraw their candidacy.

```{r}
#| warning: false
#| message: false
#| label: revenues-candidacies-join

receitas_candidatos_por_tempo <- candidatos %>%
  filter(
    ano_eleicao %in% c(2018, 2022), 
    cargo %in% c('DEPUTADO DISTRITAL', 
                 'DEPUTADO ESTADUAL', 
                 'DEPUTADO FEDERAL')) %>%
  select(
    ano_eleicao, 
    uf, 
    cargo, 
    partido, 
    sq_candidato,
    cpf_candidato, 
    genero, 
    cor_raca,
    eleito,
    reeleicao) %>% 
  left_join(
    receitas_por_tempo,
    by = c('ano_eleicao', 'cargo', 'uf', 'sq_candidato', 'cpf_candidato'),
    suffix = c("", ".y")) %>%
  select(-ends_with(".y")) %>%
  select(
    ano_eleicao,
    uf,
    cargo, 
    partido,
    sq_candidato,
    cpf_candidato, 
    genero, 
    cor_raca, 
    eleito,
    reeleicao,
    fonte_receita,
    data_receita,
    tipo_fonte_receita,
    especie_receita,
    total_receita_especie
  ) %>%
  na.omit()
```

⟶ `{r} nrow(receitas_candidatos_por_tempo)` lines generated.

#### Step 2 - Selects and aggregates public revenue data by gender and race

In this step, we aggregate public revenue data from both the 2018 and 2020 General Elections by gender and race.

```{r}
#| warning: false
#| message: false
#| label: public-revenue-by-gender-and-race

# applies monetary correction factor: 2018-october => 2022-october
fator_correcao <- 1.63171680

receitas_publicas <- receitas_candidatos_por_tempo %>%
  filter(
    ano_eleicao %in% c(2018, 2022), 
    cargo %in% c('DEPUTADO DISTRITAL', 
                 'DEPUTADO ESTADUAL', 
                 'DEPUTADO FEDERAL'),
    tipo_fonte_receita %in% c('FEFC', 'FP'),
    total_receita_especie < 20000002.89) %>% # exclui entrada outlier
  mutate(
    receita = ifelse(
      ano_eleicao == 2018, 
      total_receita_especie * fator_correcao, 
      total_receita_especie)) %>%
  group_by(
    ano_eleicao,
    uf,
    genero,
    cor_raca,
    eleito,
    cargo,
    reeleicao,
    partido,
    cpf_candidato,
    tipo_fonte_receita) %>%
  summarise(
    rec_cand = sum(total_receita_especie), 
    na.rm = TRUE) %>%
  arrange(ano_eleicao, cargo, uf, genero, cor_raca)
```

⟶ `{r} nrow(receitas_publicas)` lines generated.

#### Step 3 - Selects and aggregates public revenue data by source and week

In this step, we aggregate public revenue data from both the 2018 and 2020 General Elections by source and week.

```{r}
#| warning: false
#| message: false
#| label: public-revenue-by-source-and-week

receitas_publicas_semana <- receitas_candidatos_por_tempo %>%
  filter(
    ano_eleicao %in% c(2018, 2022), 
    cargo %in% c('DEPUTADO DISTRITAL', 
                 'DEPUTADO ESTADUAL', 
                 'DEPUTADO FEDERAL'),
    tipo_fonte_receita %in% c('FEFC', 'FP'),
    total_receita_especie < 20000002.89) %>% # exclui entrada outlier
  mutate(
    receita = ifelse(
      ano_eleicao == 2018, 
      total_receita_especie * fator_correcao, 
      total_receita_especie),
    semana_receita = as.Date(
      cut(data_receita, breaks = 'week', 
          start.on.monday = FALSE)),) %>%
  group_by(
    ano_eleicao,
    uf,
    genero,
    cor_raca,
    eleito,
    cargo,
    reeleicao,
    partido,
    cpf_candidato,
    tipo_fonte_receita,
    semana_receita) %>%
  summarise(rec_cand = sum(total_receita_especie))
```

⟶ `{r} nrow(receitas_publicas_semana)` lines generated.

#### Step 4 - Filter only public revenue data and aggregate them by source, kind and week

In this step, we aggregate public revenue data from both the 2018 and 2020 Brazilian General Elections by source and kind of campaign revenue.

Candidate level data

```{r}
#| warning: false
#| message: false
#| label: public-revenue-by-source-and-kind-clevel

receitas_publicas_especie <- receitas_candidatos_por_tempo %>%
  filter(
    ano_eleicao %in% c(2018, 2022), 
    cargo %in% c('DEPUTADO DISTRITAL', 
                 'DEPUTADO ESTADUAL', 
                 'DEPUTADO FEDERAL'),
    tipo_fonte_receita %in% c('FEFC', 'FP'),
    total_receita_especie < 20000002.89) %>% # exclui entrada outlier
  mutate(
    receita = ifelse(
      ano_eleicao == 2018, 
      total_receita_especie * fator_correcao, 
      total_receita_especie)) %>%
  group_by(
    ano_eleicao,
    uf,
    genero,
    cor_raca,
    eleito,
    cargo,
    reeleicao,
    partido,
    cpf_candidato,
    tipo_fonte_receita,
    especie_receita) %>%
  summarise(rec_cand = sum(total_receita_especie))
```

⟶ `{r} nrow(receitas_publicas_especie)` lines generated.

Kind of revenue level data

```{r}
#| warning: false
#| message: false
#| label: public-revenue-by-source-and-kind-klevel

# agrega receitas públicas por eleição, gênero, cor/raça e espécie
receitas_publicas_especie_agr <- receitas_publicas_especie %>%
  mutate(cargo = ifelse(cargo == "DEPUTADO DISTRITAL", 
                        "DEPUTADO ESTADUAL", 
                        cargo)) %>%
  group_by(ano_eleicao, cargo, genero, cor_raca) %>%
  mutate(total_rec_ano_cargo_genero_cor = sum(rec_cand, na.rm = TRUE)) %>%
  ungroup() %>%
  group_by(ano_eleicao, cargo, genero, cor_raca, especie_receita) %>%
  summarise(
    total_rec_ano_cargo_genero_cor_especie = sum(rec_cand, na.rm = TRUE), 
    perc_rec_ano_cargo_genero_cor_especie = total_rec_ano_cargo_genero_cor_especie / max(total_rec_ano_cargo_genero_cor),
    .groups = "drop"
  ) %>%
  arrange(ano_eleicao, cargo, genero, especie_receita, cor_raca)
```

⟶ `{r} nrow(receitas_publicas_especie_agr)` lines generated.

#### Step 5 - Aggregate public revenues by election, week, gender and race - federal deputy candidates

In this step, we aggregate public revenue data from both the 2018 and 2020 Brazilian General Elections by year of election, week, gender and race, for federal deputy candidates.

```{r}
#| warning: false
#| message: false
#| label: public-revenue-by-week-federal-deputy

receitas_publicas_semana_dep_fed <- receitas_publicas_semana %>%
  filter(cargo == 'DEPUTADO FEDERAL') %>%
  group_by(ano_eleicao) %>%
  mutate(total_cand_ano = n_distinct(cpf_candidato)) %>%
  ungroup() %>%
  group_by(ano_eleicao, genero, cor_raca) %>%
  mutate(total_cand_cor = n_distinct(cpf_candidato)) %>%
  ungroup() %>%
  group_by(ano_eleicao, semana_receita) %>%
  mutate(
    total_receitas_semana = sum(rec_cand, na.rm = TRUE),
    media_receitas_semana = sum(rec_cand, na.rm = TRUE) / total_cand_ano) %>%
  ungroup() %>%
  group_by(ano_eleicao, semana_receita, genero, cor_raca) %>%
  mutate(
    total_receitas_cor = sum(rec_cand, na.rm = TRUE),
    media_receitas_genero_cor = sum(rec_cand, na.rm = TRUE) / total_cand_cor) %>%
  ungroup() %>%
  arrange(ano_eleicao, semana_receita, genero, cor_raca)
```

⟶ `{r} nrow(receitas_publicas_semana_dep_fed)` lines generated (federal deputy).

#### Step 6 - Aggregate public revenues per week - state deputy candidates

In this step, we aggregate public revenue data from both the 2018 and 2020 Brazilian General Elections by year of election, week, gender and race, for state deputy candidates.

```{r}
#| warning: false
#| message: false
#| label: public-revenue-by-week-state-deputy

# agrega receitas públicas por eleição, semana, gênero, cor/raça e situaçao de reeleição
receitas_publicas_semana_dep_est <- receitas_publicas_semana %>%
  filter(cargo %in% c('DEPUTADO DISTRITAL', 'DEPUTADO ESTADUAL')) %>%
  group_by(ano_eleicao) %>%
  mutate(total_cand_ano = n_distinct(cpf_candidato)) %>%
  ungroup() %>%
  group_by(ano_eleicao, genero, cor_raca) %>%
  mutate(total_cand_cor = n_distinct(cpf_candidato)) %>%
  ungroup() %>%
  group_by(ano_eleicao, semana_receita) %>%
  mutate(
    total_receitas_semana = sum(rec_cand, na.rm = TRUE),
    media_receitas_semana = sum(rec_cand, na.rm = TRUE) / total_cand_ano) %>%
  ungroup() %>%
  group_by(ano_eleicao, semana_receita, genero, cor_raca) %>%
  mutate(
    total_receitas_cor = sum(rec_cand, na.rm = TRUE),
    media_receitas_genero_cor = sum(rec_cand, na.rm = TRUE) / total_cand_cor) %>%
  ungroup() %>%
  arrange(ano_eleicao, semana_receita, genero, cor_raca)
```

⟶ `{r} nrow(receitas_publicas_semana_dep_est)` lines generated (state deputy).

#### Step 7 - Calculate minimum and maximum values of public revenue

In this step, we identify both the minimum and maximum values of public revenue values and the minimum and maximum dates for constructing the graph scales.

```{r}
#| warning: false
#| message: false
#| label: max-public-revenue

sum_receitas <- aggregate(
  rec_cand ~ ano_eleicao + genero + semana_receita, 
  data = receitas_publicas_semana, sum)
sum_receitas <- tibble(sum_receitas) %>%
  arrange(ano_eleicao, genero, semana_receita)
max_receita <- max(sum_receitas$rec_cand)

# foram usadas as datas da convenção partidária
min_data_convencao_2018 <- c("2018-07-20", "2018-10-06")
min_data_convencao_2022 <- c("2022-07-20", "2022-10-01")
min_data_campanha_2018 <- c("2018-08-13", "2018-10-06")
min_data_campanha_2022 <- c("2022-08-16", "2022-10-01")

# funçao para gravar uma figura em alta resoluçao
grava_figura <- function(nome, formato) {
  Cairo::Cairo(
    30,
    20,
    file = paste(nome, formato, sep = "."),
    type = formato,
    bg = "transparent",
    dpi = 600,
    units = "cm"
  )  
}
```

-   Maximum public revenue value for a single week = `{r} max_receita`

-   Convention date for 2018 Election = `{r} min_data_convencao_2018`

-   Convention date for 2022 Election = `{r} min_data_convencao_2022`

# Data Analysis

## Preliminary Analysis - total resources transfers between candidate providers, by resource type

```{r}
#| warning: false
#| message: false
#| label: total-resource-transfers-candidate-providers-by-resource-type

receitas %>%
  filter(ano_eleicao == 2022,
         NR_CPF_CNPJ_DOADOR != NR_CNPJ_PRESTADOR_CONTA,
         DS_FONTE_RECURSO %in% 
           c('Recursos de outros candidatos', 
             'Recursos de partido político')) %>%
  group_by(tipo_recurso) %>%
  summarise(transferencias_entre_prestadores = sum(TOTAL_RECEITA),
            .groups = 'drop')
```

## **Distribution of Cash-Equivalent and Cash Funding**

::: callout-note
In this work, we consider only funding originating from public revenue.
:::

### Graph 1 - Proportion of cash-equivalent funds of candidates for federal deputy, by race and gender

Create data object

```{r}
#| warning: false
#| message: false
#| label: create-g1-data

g1_data <- receitas_publicas_especie_agr %>% 
    filter(
      ano_eleicao %in% c(2018, 2022),
      cor_raca %in% c("White", "Black"),
      cargo == 'DEPUTADO FEDERAL',
      especie_receita == 'Cash-equivalent') %>%
      mutate(
      ano_eleicao = as.factor(ano_eleicao),
      cor_raca = factor(cor_raca, levels = c("White", "Black"))
  )
```

Plot figure

```{r}
#| warning: false
#| message: false
#| class: output
#| label: fig-1
#| fig-cap: "Proportion of cash-equivalent funding of candidates to federal deputy, in 2018 and 2022, by gender and race"
#| fig-alt: "Proportion of cash-equivalent funding of candidates to federal deputy, in 2018 and 2022, by gender and race"
#| fig-width: 8
#| fig-height: 6

# identifies the minimum and maximum revenue, as well as the minimum and maximum dates to build the chart scale
max_receita <- max(g1_data$perc_rec_ano_cargo_genero_cor_especie)

#limites <- c(0, max_receita) * 2
#quebras <- round(seq(limites[1], limites[2], by = max_receita / 10), 
#                 digits = 2)

# 2018 and 2022
g1_figure <- ggplot(
  data = g1_data, 
  aes(fill = genero, 
      x = as.factor(cor_raca), 
      y = perc_rec_ano_cargo_genero_cor_especie)) +
  geom_bar(position = "dodge", stat = "identity") +
  geom_text(
    aes(label=scales:::percent(perc_rec_ano_cargo_genero_cor_especie, accuracy=0.1)), position=position_dodge(width = 0.9), 
    vjust= -0.5, 
    size= 4
    )+
  facet_wrap(~ano_eleicao) +
  theme_minimal() +
  scale_fill_brewer(palette = "Set1") +
  scale_y_continuous(
    labels = scales::percent,
   # limits = limites,
  #  breaks = quebras) +
     limits = c(0, 0.10), #0% a 10%
      breaks = seq(0, 0.10, by = 0.02)) +

  labs(
    x = "Race/color", 
    y = "Proportion of cash-equivalent funding (%)") +
  theme(
  panel.grid.major.x = element_blank(),
    panel.grid.minor = element_blank(),
    axis.text.x = element_text(size = 14), 
    axis.text.y = element_text(size = 14), 
    axis.title.x = element_text(size = 15), 
    axis.title.y = element_text(size = 15), 
    strip.text = element_text(size = 15),    
    legend.title = element_blank(), 
    legend.text = element_text(size = 11)) 

g1_figure
```

Save figure

```{r}
#| warning: false
#| message: false
#| label: save-fig-1

grava_figura("..//Figuras//paper-fig-1", "png")
g1_figure
dev.off()
```


### Graph 2 - Proportion of cash-equivalent funds of candidates for state deputy, by race and gender

Create data object

```{r}
#| warning: false
#| message: false
#| label: create-g2-data

g2_data <- receitas_publicas_especie_agr %>% 
    filter(
      ano_eleicao %in% c(2018, 2022),
      cor_raca %in% c("White", "Black"),
      cargo == "DEPUTADO ESTADUAL",
      especie_receita == "Cash-equivalent") %>%
  mutate(
    ano_eleicao = as.factor(ano_eleicao),
    cor_raca = factor(cor_raca, levels = c("White", "Black"))
  )
```

Plot figure

```{r}
#| warning: false
#| message: false
#| class: output
#| label: fig-2
#| fig-cap: "Proportion of cash-equivalent funding of candidates to state deputy, in 2018 and 2022, by gender and race"
#| fig-alt: "Proportion of cash-equivalent funding of candidates to state deputy, in 2018 and 2022, by gender and race"
#| fig-width: 8
#| fig-height: 6

# identifies the minimum and maximum revenue, as well as the minimum and maximum dates to build the chart scale
max_receita <- max(g2_data$perc_rec_ano_cargo_genero_cor_especie)

limites <- c(0, max_receita) * 1.01
quebras <- seq(limites[1], limites[2], by = max_receita / 10)

# 2018 and 2022
g2_figure <- ggplot(
  data = g2_data, 
  aes(fill = genero, 
      x = as.factor(cor_raca), 
      y = perc_rec_ano_cargo_genero_cor_especie)) +
  geom_bar(position = "dodge", stat = "identity") +
  geom_text(
   aes(label = scales::percent(perc_rec_ano_cargo_genero_cor_especie, accuracy = 0.1)),
    position = position_dodge(width = 0.9),
    vjust = -0.5,
    size = 4
  ) +
  facet_wrap(~ano_eleicao) +
  theme_minimal() +
  scale_fill_brewer(palette = "Set1") +
  scale_y_continuous(
    labels = scales::percent,
 #   limits = limites,
#    breaks = quebras) +
  limits = c(0, 0.20), #0% a 22%
    breaks = seq(0, 0.20, by = 0.02)) +
  labs(
    x = "Race/color", 
    y = "Proportion of cash-equivalent funding (%)") +
  theme(
    #panel.background = element_rect(fill='transparent'),
    #plot.background = element_rect(fill='transparent', color=NA),
    panel.grid.major.x = element_blank(),
    panel.grid.minor = element_blank(),
    axis.text.x = element_text(size = 14), #RAÇAS
    axis.text.y = element_text(size = 14), # % 0 A 100%
    axis.title.x = element_text(size = 15), # legenda horizontal
    axis.title.y = element_text(size = 15), # legenda vertical
    strip.text = element_text(size = 15),    
    legend.title = element_blank(), 
    legend.text = element_text(size = 11))


g2_figure
```

Save figure

```{r}
#| warning: false
#| message: false
#| label: save-fig-2

grava_figura("..//Figuras//paper-fig-2", "png")
g2_figure
dev.off()
```


## When the funds were transferred

### Graph 3 - Ratio between the group and the overall average of public funding of candidates for federal deputy by gender and race, each week

::: callout-note
The amount of public revenue for the week received by the group (gender, race) weighted by the number of applications is equivalent to the average public revenue for that week.
:::

Create data object

```{r}
#| warning: false
#| message: false
#| label: create-g3-data

# Federal Deputy
g3_data <-  receitas_publicas_semana_dep_fed %>% 
  filter_at(vars(everything()), all_vars(!is.na(.))) %>%
  filter(
    genero %in% c("Men", "Women"),
    cor_raca %in% c('White', 'Black'))

# para alterar a ordem das barras, para ficar como na versão em português
g3_data <- g3_data %>%
  mutate(
    genero_cor = factor(
      paste(genero, cor_raca, sep = "."),
      levels = c("Men.White", "Men.Black", "Women.White", "Women.Black")
    )
  )

max_media_receita_a <- max(g3_data$media_receitas_genero_cor)

```

Plot figure (Federal Deputy)

```{r}
#| warning: false
#| message: false
#| class: output
#| label: fig-3
#| fig-cap: "Ratio between the group average and the overall average of public funding of candidates for federal deputy by gender and color/race, each week"
#| fig-alt: "Ratio between the group average and the overall average of public funding of candidates for federal deputy by gender and color/race, each week"
#| fig-width: 10
#| fig-height: 8

# Definir idioma para visualizar data em inglês 
invisible(Sys.setlocale("LC_TIME", "English"))

limites <- c(0, max_media_receita_a) * 1.1
quebras <- seq(limites[1], limites[2], by = 20000)


# 2018
g3_2018_figure <- ggplot(
  data = g3_data %>% 
    filter(ano_eleicao == 2018),
  aes(
#    fill = interaction(genero, cor_raca, lex.order = TRUE),
     fill = genero_cor,
     x = semana_receita, 
      y = media_receitas_genero_cor)) +
  geom_bar(position = "dodge", stat = "identity") +
  geom_text(
    aes(label = paste0(
      "(",
      sprintf("%0.1f", round(media_receitas_genero_cor / media_receitas_semana, 1)),
      ")"
  #   , sep = "")
    )),
    stat = "identity",
    size = 3,
    position = position_dodge(6),
    vjust = -2) +
  scale_x_date(
    limits = as.Date(min_data_campanha_2018),
    labels = date_format("%b-%d"), 
    breaks = "1 week") +
  scale_y_continuous(
    labels = scales::label_number(scale_cut = scales::cut_short_scale()),
    limits = limites, breaks = quebras) +
  labs(
    title = "2018",
    x = "Month - First day of the week",
    y = "Average funding by group (BRL)",
    fill = "Gender.Race") +
  theme(panel.background = element_rect(fill = "transparent"),
        panel.grid.major.x = element_blank(), 
        panel.grid.major.y = element_line(linewidth = 0.1, 
                                          colour = "grey50"),
        panel.grid.minor = element_blank(),
        axis.text.x = element_text(size = 10),
        axis.text.y = element_text(size = 10),
        axis.title.x = element_text(size = 12),
        axis.title.y = element_text(size = 11),
        plot.title = element_text(size = 12, face = "bold"),
        legend.title = element_blank(),
        legend.position = "bottom")

# 2022
g3_2022_figure <- ggplot(
  data = g3_data %>% 
    filter(ano_eleicao == 2022),
  aes(
  #  fill = interaction(genero, cor_raca, lex.order = TRUE),
    fill = genero_cor, 
    x = semana_receita, 
    y = media_receitas_genero_cor)) +
  geom_bar(position = "dodge", stat = "identity") +
  geom_text(
    aes(label = paste(
      "(",
      sprintf("%0.1f", 
              round(media_receitas_genero_cor / media_receitas_semana, 1)), ")",
      sep = "")),
    stat = "identity",
    size = 3,
    position = position_dodge(6),
    vjust = -2) +
  scale_x_date(
    limits = as.Date(min_data_campanha_2022),
    labels = date_format("%b-%d"), 
    breaks = "1 week") +
  scale_y_continuous(
    labels = scales::label_number(scale_cut = scales::cut_short_scale()),
    limits = limites, breaks = quebras) +
  labs(
    title = "2022",
    x = "Month - First day of the week",
    y = "Average funding by group (BRL)",
    fill = "Gender.Race") +
  theme(panel.background = element_rect(fill = "transparent"),
        panel.grid.major.x = element_blank(), 
        panel.grid.major.y = element_line(linewidth = 0.1, 
                                          colour = "grey50"),
        panel.grid.minor = element_blank(),
        axis.text.x = element_text(size = 10),
        axis.text.y = element_text(size = 10),
        axis.title.x = element_text(size = 12),
        axis.title.y = element_text(size = 11),
        plot.title = element_text(size = 12, face = "bold"),
        legend.title = element_blank(),
        legend.position = "bottom") +
  geom_vline(
    xintercept = as.Date("2022-09-13"),
    lwd = 1,
    linetype = 3,
    colour="black") +
  annotate("text", x=as.Date("2022-09-12"), y=700000000, 
           label="1st. partial", angle=90) 

g3_figure <- g3_2018_figure + g3_2022_figure + plot_layout(ncol = 1, 
                                                           nrow = 2)
g3_figure
```

Save figure (Federal Deputy)

```{r}
#| warning: false
#| message: false
#| label: save-fig-3

grava_figura("..//Figuras//paper-fig-3", "png")
g3_figure
dev.off()
```



### Graph 4 - Ratio between the group and the overall average of public funding of candidates for state deputy by gender and color/race, each week

::: callout-note
The amount of public revenue for the week received by the group (gender, race) weighted by the number of applications is equivalent to the average public revenue for that week.
:::

Create data object

```{r}
#| warning: false
#| message: false
#| label: create-g4-data

# State Deputy
g4_data <-  receitas_publicas_semana_dep_est %>% 
  filter_at(vars(everything()), all_vars(!is.na(.))) %>%
  filter(
    genero %in% c("Men", "Women"),
    cor_raca %in% c('White', 'Black'))


# para alterar a ordem das barras, para ficar como na versão em português
g4_data <- g4_data %>%
  mutate(
    genero_cor = factor(
      paste(genero, cor_raca, sep = "."),
      levels = c("Men.White", "Men.Black", "Women.White", "Women.Black")
    )
  )


max_media_receita_b <- max(g4_data$media_receitas_genero_cor)
```

Plot figure (State Deputy)

```{r}
#| warning: false
#| message: false
#| class: output
#| label: fig-4
#| fig-cap: "Ratio between the group average and the overall average of public funding of candidates for state deputy by gender and race, each week"
#| fig-alt: "Ratio between the group average and the overall average of public funding of candidates for state deputy by gender and race, each week"
#| fig-width: 10
#| fig-height: 8


# Definir idioma para visualizar data em inglês 
invisible(Sys.setlocale("LC_TIME", "English"))


limites <- c(0, max_media_receita_b) * 1.1
quebras <- seq(limites[1], limites[2], by = 20000)


# 2018
g4_2018_figure <- ggplot(
  data = g4_data %>% filter(ano_eleicao == 2018),
  aes(
    #fill = interaction(genero, cor_raca, lex.order = TRUE),
    fill= genero_cor,
      x = semana_receita, 
      y = media_receitas_genero_cor)) +
  geom_bar(position = "dodge", stat = "identity") +
  geom_text(
    aes(label = paste0(
      "(",
      sprintf("%0.1f", 
              round(media_receitas_genero_cor / media_receitas_semana, 1)),
      ")"
    # , sep = "")),
    )),
    stat = "identity",
    size = 3,
    position = position_dodge(6),
    vjust = -2) +
  scale_x_date(
    limits = as.Date(min_data_campanha_2018),
    labels = date_format("%b-%d"), 
    breaks = "1 week") +
  scale_y_continuous(
    labels = scales::label_number(scale_cut = scales::cut_short_scale()),
    limits = limites, breaks = quebras) +
  labs(
    title = "2018",
    x = "Month - First day of the week",
    y = "Average funding by group (BRL)",
    fill = "Gender.Race") +
  theme(panel.background = element_rect(fill = "transparent"),
        panel.grid.major.x = element_blank(), 
        panel.grid.major.y = element_line(linewidth = 0.1, 
                                          colour = "grey50"),
        panel.grid.minor = element_blank(),
        axis.text.x = element_text(size = 10),
        axis.text.y = element_text(size = 10),
        axis.title.x = element_text(size = 12),
        axis.title.y = element_text(size = 11),
        plot.title = element_text(size = 12, face = "bold"),
        legend.title = element_blank(),
        legend.position = "bottom")


# 2022
g4_2022_figure <- ggplot(
  data = g4_data %>% filter(ano_eleicao == 2022),
  aes(
  #  fill = interaction(genero, cor_raca, lex.order = TRUE),
    fill = genero_cor,
    x = semana_receita, 
    y = media_receitas_genero_cor)) +
  geom_bar(position = "dodge", stat = "identity") +
  geom_text(
    aes(label = paste(
      "(",
      sprintf("%0.1f", 
              round(media_receitas_genero_cor / media_receitas_semana, 1)),
      ")",
      sep = "")),
    stat = "identity",
    size = 3,
    position = position_dodge(6),
    vjust = -2) +
  scale_x_date(
    limits = as.Date(min_data_campanha_2022),
    labels = date_format("%b-%d"), 
    breaks = "1 week") +
  scale_y_continuous(
    labels = scales::label_number(scale_cut = scales::cut_short_scale()),
    limits = limites, breaks = quebras) +
  labs(
    title = "2022",
    x = "Month - First day of the week",
    y = "Average funding by group (BRL)",
    fill = "Gender.Race") +
  theme(panel.background = element_rect(fill = "transparent"),
        panel.grid.major.x = element_blank(), 
        panel.grid.major.y = element_line(linewidth = 0.1, 
                                          colour = "grey50"),
        panel.grid.minor = element_blank(),
        axis.text.x = element_text(size = 10),
        axis.text.y = element_text(size = 10),
        axis.title.x = element_text(size = 12),
        axis.title.y = element_text(size = 11),
        plot.title = element_text(size = 12, face = "bold"),
        legend.title = element_blank(),
        legend.position = "bottom") +
  geom_vline(
    xintercept = as.Date("2022-09-13"),
    lwd = 1,
    linetype = 3,
    colour="black") +
  annotate("text", x=as.Date("2022-09-12"), y=700000000, 
           label="1a. parcial", angle=90) 

g4_figure <- g4_2018_figure + g4_2022_figure + plot_layout(ncol = 1, 
                                                           nrow = 2)
g4_figure
```

Save figure (State Deputy)

```{r}
#| warning: false
#| message: false
#| label: save-fig-4

# save figure in PNG format
grava_figura("..//Figuras//paper-fig-4", "png")
g4_figure
dev.off()

```



## Transfers between candidate providers

Define the format style of the tables

```{r}
#| warning: false
#| message: false
#| label: format-table-styles

set_flextable_defaults(
  split = TRUE,
  table_align = "center",
  digits = 2, 
  border.color = "black")
properties1 <- fp_text(font.family = "Times New Roman",
                      font.size = 10)
properties2 <- fp_text(font.family = "Times New Roman",
                      font.size = 10,
                      bold = TRUE)
```

### Table 1 - Transfer of funds between candidates, by gender

::: callout-note
Here, we consider only funding originating from public revenue.
:::

Create dataset for table 1

```{r}
#| warning: false
#| message: false
#| label: create-data-table-1

d_t_1 <- receitas %>%
  filter(ano_eleicao == 2022,
         NR_CPF_CNPJ_DOADOR != NR_CNPJ_PRESTADOR_CONTA,
         DS_FONTE_RECURSO %in% c('Recursos de outros candidatos', 
                                 'Recursos de partido político'),
         tipo_recurso == 'Public',
         genero %in% c('Men', 'Women'),
         genero_doador %in% c('Men', 'Women')) %>%
  mutate(year_election = ano_eleicao,
         position = DS_CARGO,
         gender_accountaunt = genero,
         gender_donor = genero_doador) %>%
  select(-c(ano_eleicao, DS_CARGO, genero, genero_doador)) %>%
  group_by(gender_donor, gender_accountaunt) %>%
  summarise(transfers_between_accountants = sum(TOTAL_RECEITA),
            .groups = 'drop') %>%
  ungroup() %>%
  group_by(gender_donor) %>%
  mutate(
    prc_transfers_rel_gender_donor = transfers_between_accountants / 
        sum(transfers_between_accountants)) %>%
  ungroup() %>%
  mutate(
      prc_transfers_rel_total = transfers_between_accountants / 
        sum(transfers_between_accountants)) %>%
  select(gender_donor, gender_accountaunt, 
         transfers_between_accountants,
         prc_transfers_rel_total,
         prc_transfers_rel_gender_donor) %>%
  as_tibble()

d_t_1
```

Plot table 1 (HTML format)

```{r}
#| warning: false
#| message: false
#| label: plot-table-1

t_1 <- as_flextable(d_t_1) |>
  delete_part("foot") |>
  delete_rows(i = 2, part = "header") |>
  mk_par(
    j = 4, part = "body",
    value = as_paragraph(as_chunk(prc_transfers_rel_total,
                                  formatter = fmt_pct))) |>
    mk_par(
    j = 5, part = "body",
    value = as_paragraph(as_chunk(prc_transfers_rel_gender_donor,
                                  formatter = fmt_pct))) |>
  labelizor(
    part = "header",
    labels = c(
      "gender_accountaunt" = "Accountant gender",
      "gender_donor" = "Donor gender",
      "transfers_between_accountants" = "Transfer amount (BRL)",
      "prc_transfers_rel_total" = "% (rel. total)",
      "prc_transfers_rel_gender_donor" = "% (rel. donor gender)")) |>
  align(align = "center", part = "all") |>
  width(width = 1.25) |>
  set_table_properties(layout = "fixed") |>
  flextable::style(
    pr_t = properties1,
    part = "all") |>
  flextable::style(
    pr_t = properties2,
    part = "header")

t_1
```

Save table 1 (XLSX and DOCX format)

```{r}
#| warning: false
#| message: false
#| label: save-table-1

# XLSX
write.xlsx(d_t_1, 
           '..//Tabelas//Paper_Table_1_Dataset.xlsx')

# DOCX
path <- '..//Tabelas/Paper_Table_1_R.docx'
doc <- read_docx()
doc <- body_add_flextable(doc, value=t_1)
print(doc, target = path)
```

### Table 2 - Transfer of funds between candidates, by race

::: callout-note
Here, we consider only funding originating from public revenue. The candidate who donates is called the donor, and the candidate who receives the donation is called the accountant.
:::

Create dataset for table 2

```{r}
#| warning: false
#| message: false
#| label: create-data-table-2

d_t_2 <- receitas %>%
  filter(ano_eleicao == 2022,
         NR_CPF_CNPJ_DOADOR != NR_CNPJ_PRESTADOR_CONTA,
         DS_FONTE_RECURSO %in% c('Recursos de outros candidatos', 
                                 'Recursos de partido político'),
         tipo_recurso == 'Public',
         cor_raca_doador %in% c('White', 'Black'),
         cor_raca %in% c('White', 'Black')) %>%
  mutate(year_election = ano_eleicao,
         position = DS_CARGO,
         gender = genero,
         gender_donor = genero_doador,
         race_donor = cor_raca_doador,
         race_accountant = cor_raca) %>%
  select(-c(ano_eleicao, DS_CARGO, genero, genero_doador,
            cor_raca, cor_raca_doador)) %>%
  group_by(race_donor, race_accountant) %>%
  summarise(transfers_between_accountants = sum(TOTAL_RECEITA),
            .groups = 'drop') %>%
  ungroup() %>%
  group_by(race_donor) %>%
  mutate(
    prc_transfers_rel_race_donor = transfers_between_accountants / 
        sum(transfers_between_accountants)) %>%
  ungroup() %>%
  mutate(
      prc_transfers_rel_total = transfers_between_accountants / 
        sum(transfers_between_accountants)) %>%
  select(race_donor, race_accountant, 
         transfers_between_accountants,
         prc_transfers_rel_total,
         prc_transfers_rel_race_donor) %>%
  as_tibble()

d_t_2
```

Plot table 2 (HTML format)

```{r}
#| warning: false
#| message: false
#| label: plot-table-2

t_2 <- as_flextable(d_t_2) |>
  delete_part("foot") |>
  delete_rows(i = 2, part = "header") |>
  mk_par(
    j = 4, part = "body",
    value = as_paragraph(as_chunk(prc_transfers_rel_total,
                                  formatter = fmt_pct))) |>
    mk_par(
    j = 5, part = "body",
    value = as_paragraph(as_chunk(prc_transfers_rel_race_donor,
                                  formatter = fmt_pct))) |>
  labelizor(
    part = "header",
    labels = c(
      "race_accountant" = "Accountant race/color",
      "race_donor" = "Donor race/color",
      "transfers_between_accountants" = "Transfer amount (BRL)",
      "prc_transfers_rel_total" = "% (rel. total)",
      "prc_transfers_rel_race_donor" = "% (rel. donor race/color)")) |>
  align(align = "center", part = "all") |>
  width(width = 1.25) |>
  set_table_properties(layout = "fixed") |>
  flextable::style(
    pr_t = properties1,
    part = "all") |>
  flextable::style(
    pr_t = properties2,
    part = "header")

t_2
```

Save table 2 (XLSX and DOCX)

```{r}
#| warning: false
#| message: false
#| label: save-table-2

# XLSX
write.xlsx(d_t_2, 
           '..//Tabelas//Paper_Table_2_Dataset.xlsx')

# DOCX
path <- '..//Tabelas/Paper_Table_2_R.docx'
doc <- read_docx()
doc <- body_add_flextable(doc, value = t_2)
print(doc, target = path)
```

## Transfers and Reelection

::: callout-note
In this work, we consider only funding originating from public revenues.

Only the transformed variables presented in the tables below were translated; the others were kept in Portuguese.
:::

### Create dataset for Tables 3 and 4

```{r}
#| warning: false
#| message: false
#| label: create-data-tables-3-4

d_tables_3_4 <- receitas_publicas %>% 
  filter(
    cor_raca %in% c("White", "Black"),
    genero %in% c("Men", "Women"),
    cargo %in% c('DEPUTADO FEDERAL',
                 'DEPUTADO DISTRITAL',
                 'DEPUTADO ESTADUAL')
  ) %>%
  mutate(
    year_election = ano_eleicao,
    race_color = cor_raca,
    gender = genero,
    position = ifelse(cargo == "DEPUTADO DISTRITAL", 
                      "DEPUTADO ESTADUAL", 
                      cargo),
    dispute_status = ifelse(reeleicao == "Reelection",
                            "Incumbent",
                            reeleicao)
  ) %>%
  select(-c(ano_eleicao, cor_raca, genero, cargo, reeleicao)) %>%
  group_by(position, year_election, gender, race_color, dispute_status) %>%
  summarise(
    total_receita_grupo = sum(rec_cand, na.rm = TRUE),
    qtd_cand_grupo = n_distinct(cpf_candidato),
    .groups = "drop") %>%
  ungroup() %>%
  group_by(position, year_election) %>%
  mutate(
    prc_rvn = total_receita_grupo / sum(total_receita_grupo, na.rm = TRUE),
    prc_cnd = qtd_cand_grupo / sum(qtd_cand_grupo, na.rm = TRUE),
    avg_rvn = total_receita_grupo /  qtd_cand_grupo) %>%
  select(
    position, year_election, gender, race_color, dispute_status, 
    prc_cnd, prc_rvn, avg_rvn)

head(d_tables_3_4)
```

### Table 3 - Proportion and average funds of candidates for federal deputy by gender, race and dispute status (challenger x incumbent)

Plot Table 3 - panel "a": Federal Deputy - 2018

```{r}
#| warning: false
#| message: false
#| label: plot-table-3-a

v_year_election = 2018
t_3_a <- d_tables_3_4 %>%
  filter(year_election == v_year_election,
         position == "DEPUTADO FEDERAL") %>%
  tabulator(
    rows = c("gender", "race_color"),
    columns = "dispute_status",
    `Candidate (%)` = as_paragraph(as_chunk(prc_cnd, formatter = fmt_pct)),
    `Revenue (%)` = as_paragraph(as_chunk(prc_rvn, formatter = fmt_pct)),
    `Revenue avg. (BRL)` = as_paragraph(as_chunk(avg_rvn,
                                                 formatter = fmt_int)))

t_3_a <- as_flextable(t_3_a) %>%
  labelizor(
    part = "header",
    labels = c(
      "gender" = "Gender",
      "race_color" = "Race/color")) |>
  add_header_row(
    values = as_paragraph(
      fmt_signif_after_zeros(v_year_election, digits = 0)),
    colwidths = 10,
    top = FALSE) |>
  align(align = "center", part = "all") |>
  width(width = 0.8) |>
  set_table_properties(layout = "fixed") |>
  flextable::style(
    pr_t = properties1,
    part = "all") |>
  flextable::style(
    pr_t = properties2,
    part = "header")

t_3_a
```

Format Table 3 - panel "b": Federal Deputy - 2022

```{r}
#| label: plot-table-3-b
#| message: false
#| warning: false

v_year_election = 2022
t_3_b <- d_tables_3_4 %>%
  filter(year_election == v_year_election,
         position == "DEPUTADO FEDERAL") %>%
  tabulator(
    rows = c("gender", "race_color"),
    columns = "dispute_status",
    `Candidate (%)` = as_paragraph(as_chunk(prc_cnd, formatter = fmt_pct)),
    `Revenue (%)` = as_paragraph(as_chunk(prc_rvn, formatter = fmt_pct)),
    `Revenue avg. (BRL)` = as_paragraph(as_chunk(avg_rvn,
                                                 formatter = fmt_int)))

t_3_b <- as_flextable(t_3_b) %>%
  labelizor(
    part = "header",
    labels = c(
      "gender" = "Gender",
      "race_color" = "Race/color")) |>
  add_header_row(
    values = as_paragraph(
      fmt_signif_after_zeros(v_year_election, digits = 0)),
    colwidths = 10,
    top = FALSE) |>
  align(align = "center", part = "all") |>
  width(width = 0.8) |>
  set_table_properties(layout = "fixed") |>
  flextable::style(
    pr_t = properties1,
    part = "all") |>
  flextable::style(
    pr_t = properties2,
    part = "header")

t_3_b
```

Save table 3, panels "a" and "b", in DOCX format

```{r}
#| warning: false
#| message: false
#| label: save-table-3

path <- '..//Tabelas/Paper_Table_3_R.docx'

doc <- read_docx()
doc <- body_add_flextable(doc, value=t_3_a)
doc <- body_add_flextable(doc, value=t_3_b)
print(doc, target = path)
```

### Table 4 - Proportion and average funds of candidates for state deputy by gender, race and dispute status (challenger x incumbent)

Plot Table 4 - panel "a": State Deputy - 2018

```{r}
#| warning: false
#| message: false
#| label: plot-table-4-a

v_year_election = 2018
t_4_a <- d_tables_3_4 %>%
  filter(year_election == v_year_election,
         position == "DEPUTADO ESTADUAL") %>%
  tabulator(
    rows = c("gender", "race_color"),
    columns = "dispute_status",
    `Candidate (%)` = as_paragraph(as_chunk(prc_cnd, formatter = fmt_pct)),
    `Revenue (%)` = as_paragraph(as_chunk(prc_rvn, formatter = fmt_pct)),
    `Revenue avg. (BRL)` = as_paragraph(as_chunk(avg_rvn,
                                                 formatter = fmt_int)))

t_4_a <- as_flextable(t_4_a) %>%
  labelizor(
    part = "header",
    labels = c(
      "gender" = "Gender",
      "race_color" = "Race/color")) |>
  add_header_row(
    values = as_paragraph(
      fmt_signif_after_zeros(v_year_election, digits = 0)),
    colwidths = 10,
    top = FALSE) |>
  align(align = "center", part = "all") |>
  width(width = 0.8) |>
  set_table_properties(layout = "fixed") |>
  flextable::style(
    pr_t = properties1,
    part = "all") |>
  flextable::style(
    pr_t = properties2,
    part = "header")

t_4_a
```

Plot Table 4 - panel "a": State Deputy - 2022

```{r}
#| warning: false
#| message: false
#| label: plot-table-4-b

v_year_election = 2022
t_4_b <- d_tables_3_4 %>%
  filter(year_election == v_year_election,
         position == "DEPUTADO ESTADUAL") %>%
  tabulator(
    rows = c("gender", "race_color"),
    columns = "dispute_status",
    `Candidate (%)` = as_paragraph(as_chunk(prc_cnd, formatter = fmt_pct)),
    `Revenue (%)` = as_paragraph(as_chunk(prc_rvn, formatter = fmt_pct)),
    `Revenue avg. (BRL)` = as_paragraph(as_chunk(avg_rvn,
                                                 formatter = fmt_int)))

t_4_b <- as_flextable(t_4_b) %>%
  labelizor(
    part = "header",
    labels = c(
      "gender" = "Gender",
      "race_color" = "Race/color")) |>
  add_header_row(
    values = as_paragraph(
      fmt_signif_after_zeros(v_year_election, digits = 0)),
    colwidths = 10,
    top = FALSE) |>
  align(align = "center", part = "all") |>
  width(width = 0.8) |>
  set_table_properties(layout = "fixed") |>
  flextable::style(
    pr_t = properties1,
    part = "all") |>
  flextable::style(
    pr_t = properties2,
    part = "header")

t_4_b
```

Save table 4, panels "a" and "b", in DOCX format

```{r}
#| warning: false
#| message: false
#| label: save-table-4

path = '..//Tabelas/Paper_Table_4_R.docx'

doc <- read_docx()
doc <- body_add_flextable(doc, value = t_4_a)
doc <- body_add_flextable(doc, value = t_4_b)
print(doc, target = path)
```

## Appendix A - Table 1 - Public resource transfers between candidate providers, by gender and race

Create dataset for Table A1

```{r}
d_table_a1 <- receitas_publicas_especie_agr %>%
  filter(genero %in% c('Men', 'Women'),
         cor_raca %in% c('White', 'Black')) %>%
  mutate(
    year_election = ano_eleicao,
    race_color = cor_raca,
    gender = genero,
    position = case_when(
      cargo %in% c("DEPUTADO DISTRITAL", 
                   "DEPUTADO ESTADUAL") ~ 'State Deputy',
      cargo == 'DEPUTADO FEDERAL' ~ 'Federal Deputy'),
    rvn_kind = especie_receita) %>%
  select(-c(ano_eleicao, cor_raca, genero, cargo, especie_receita)) %>%
  group_by(year_election, position, gender, race_color,
           rvn_kind) %>%
  summarise(
    rvn_amount = sum(total_rec_ano_cargo_genero_cor_especie, na.rm = TRUE),
    .groups = "drop")

head(d_table_a1)
```

Plot Table A1

```{r}
#| warning: false
#| message: false
#| label: plot-table-a-1

v_year_election = 2018
t_a1_a <- d_table_a1 |>
  filter(year_election == v_year_election) |>
  tabulator(
    rows = c("position", "race_color"),
    columns = c("gender", "rvn_kind"),
    `Revenue amount (BRL)` = as_paragraph(
      as_chunk(rvn_amount, formatter = fmt_int)))

t_a1_a <- as_flextable(t_a1_a) |>
  labelizor(
    part = "header",
    labels = c(
      "position" = "Position",
      "gender" = "Gender",
      "race_color" = "Race/color")) |>
  add_header_row(
    values = as_paragraph(
      fmt_signif_after_zeros(v_year_election, digits = 0)),
    colwidths = 10,
    top = FALSE) |>
  align(align = "center", part = "all") |>
  width(width = 0.8) |>
  set_table_properties(layout = "fixed") |>
  flextable::style(
    pr_t = properties1,
    part = "all") |>
  flextable::style(
    pr_t = properties2,
    part = "header")

t_a1_a
```

```{r}
#| warning: false
#| message: false
#| label: plot-table-a-1-b

v_year_election = 2022
t_a1_b <- d_table_a1 |>
  filter(year_election == v_year_election) |>
  tabulator(
    rows = c("position", "race_color"),
    columns = c("gender", "rvn_kind"),
    `Revenue amount (BRL)` = as_paragraph(
      as_chunk(rvn_amount, formatter = fmt_int)))

t_a1_b <- as_flextable(t_a1_b) |>
  labelizor(
    part = "header",
    labels = c(
      "position" = "Position",
      "gender" = "Gender",
      "race_color" = "Race/color")) |>
  add_header_row(
    values = as_paragraph(
      fmt_signif_after_zeros(v_year_election, digits = 0)),
    colwidths = 10,
    top = FALSE) |>
  align(align = "center", part = "all") |>
  width(width = 0.8) |>
  set_table_properties(layout = "fixed") |>
  flextable::style(
    pr_t = properties1,
    part = "all") |>
  flextable::style(
    pr_t = properties2,
    part = "header")

t_a1_b
```

Save table A-1

```{r}
#| warning: false
#| message: false
#| label: save-table-a-1

path = '..//Tabelas/Paper_Table_A-1_R.docx'

doc <- read_docx()
doc <- body_add_flextable(doc, value = t_a1_a)
doc <- body_add_flextable(doc, value = t_a1_b)
print(doc, target = path)
```

## Appendix B

### Table B-1 - Average public resource of federal deputies by week of spending, gender and race

Create datasets for table B-1 (Federal Deputy)

```{r}
#| warning: false
#| message: false
#| label: create-data-table-b-1

d_table_b1 <- g3_data %>%
  mutate(
    year_election = ano_eleicao,
    week_revenue = semana_receita,
    race_color = cor_raca,
    gender = genero,
    position = cargo) %>%
  group_by(year_election, position, week_revenue, 
           gender, race_color) %>%
  summarise(avg_public_revenue = mean(media_receitas_genero_cor)) %>%
  mutate(avg_public_revenue = round(avg_public_revenue, 2))

head(d_table_b1)
```

### 

Plot table B1 - panel "a" (Federal Deputy - 2018)

```{r}
#| warning: false
#| message: false
#| label: plot-table-b1-a

v_year_election = 2018
min_week_revenue <- '2018-08-18'
max_week_revenue <- '2018-09-30'
t_b1_a <- d_table_b1 |>
  filter(year_election == v_year_election,
         week_revenue >= min_week_revenue,
         week_revenue <= max_week_revenue) |>
  tabulator(
    rows = c("week_revenue", "race_color"),
    columns = c("gender"),
    `Revenue avg. (BRL)` = as_paragraph(
      as_chunk(avg_public_revenue, formatter = fmt_dbl)))

t_b1_a <- as_flextable(t_b1_a) |>
  labelizor(
    part = "header",
    labels = c(
      "week_revenue" = "Revenue registration week ",
      "gender" = "Gender",
      "race_color" = "Race/color")) |>
  add_header_row(
    values = as_paragraph(
      fmt_signif_after_zeros(v_year_election, digits = 0)),
    colwidths = 6,
    top = TRUE) |>
  align(align = "center", part = "all") |>
  width(width = 0.9) |>
  set_table_properties(layout = "fixed") |>
  flextable::style(
    pr_t = properties1,
    part = "all") |>
  flextable::style(
    pr_t = properties2,
    part = "header")

t_b1_a
```

```{r}
#| warning: false
#| message: false
#| label: plot-table-b1-b

v_year_election = 2022
min_week_revenue <- '2022-08-21'
max_week_revenue <- '2022-09-25'
t_b1_b <- d_table_b1 |>
  filter(year_election == v_year_election,
         week_revenue >= min_week_revenue,
         week_revenue <= max_week_revenue) |>
  tabulator(
    rows = c("week_revenue", "race_color"),
    columns = c("gender"),
    `Revenue avg. (BRL)` = as_paragraph(
      as_chunk(avg_public_revenue, formatter = fmt_dbl)))

t_b1_b <- as_flextable(t_b1_b) |>
  labelizor(
    part = "header",
    labels = c(
      "week_revenue" = "Revenue registration week ",
      "gender" = "Gender",
      "race_color" = "Race/color")) |>
  add_header_row(
    values = as_paragraph(
      fmt_signif_after_zeros(v_year_election, digits = 0)),
    colwidths = 6,
    top = TRUE) |>
  align(align = "center", part = "all") |>
  width(width = 0.9) |>
  set_table_properties(layout = "fixed") |>
  flextable::style(
    pr_t = properties1,
    part = "all") |>
  flextable::style(
    pr_t = properties2,
    part = "header")

t_b1_b
```

Save table B-1

```{r}
#| warning: false
#| message: false
#| label: save-table-b-1

path = '..//Tabelas/Paper_Table_B-1_R.docx'

doc <- read_docx()
doc <- body_add_flextable(doc, value = t_b1_a)
doc <- body_add_flextable(doc, value = t_b1_b)
print(doc, target = path)
```

### Table B-2 - Average public resource of state deputy by revenue registration week, gender and race

Create datasets for table B-2 (State Deputy)

```{r}
#| warning: false
#| message: false
#| label: create-data-table-b-2

d_table_b2 <- g4_data %>%
  mutate(
    year_election = ano_eleicao,
    week_revenue = semana_receita,
    race_color = cor_raca,
    gender = genero,
    position = ifelse(
      cargo == "DEPUTADO DISTRITAL" | cargo == "DEPUTADO ESTADUAL", 
      "State Deputy", 
      cargo)) %>%
  group_by(year_election, position, week_revenue, 
           gender, race_color) %>%
  summarise(avg_public_revenue = mean(media_receitas_genero_cor)) %>%
  mutate(avg_public_revenue = round(avg_public_revenue, 2))

head(d_table_b2)
```

Plot table B2 - panel "a" (State Deputy - 2018)

```{r}
#| warning: false
#| message: false
#| label: plot-table-b2-a

v_year_election = 2018
min_week_revenue <- '2018-08-18'
max_week_revenue <- '2018-09-30'
t_b2_a <- d_table_b2 |>
  filter(year_election == v_year_election,
         week_revenue >= min_week_revenue,
         week_revenue <= max_week_revenue) |>
  tabulator(
    rows = c("week_revenue", "race_color"),
    columns = c("gender"),
    `Revenue avg. (BRL)` = as_paragraph(
      as_chunk(avg_public_revenue, formatter = fmt_dbl)))

t_b2_a <- as_flextable(t_b2_a) |>
  labelizor(
    part = "header",
    labels = c(
      "week_revenue" = "Revenue registration week",
      "gender" = "Gender",
      "race_color" = "Race/color")) |>
  add_header_row(
    values = as_paragraph(
      fmt_signif_after_zeros(v_year_election, digits = 0)),
    colwidths = 6,
    top = TRUE) |>
  align(align = "center", part = "all") |>
  width(width = 0.9) |>
  set_table_properties(layout = "fixed") |>
  flextable::style(
    pr_t = properties1,
    part = "all") |>
  flextable::style(
    pr_t = properties2,
    part = "header")

t_b2_a
```

Plot table B2 - panel "b" (State Deputy - 2022)

```{r}
#| warning: false
#| message: false
#| label: plot-table-b2-b

v_year_election = 2022
min_week_revenue <- '2022-08-21'
max_week_revenue <- '2022-09-25'
t_b2_b <- d_table_b2 |>
  filter(year_election == v_year_election,
         week_revenue >= min_week_revenue,
         week_revenue <= max_week_revenue) |>
  tabulator(
    rows = c("week_revenue", "race_color"),
    columns = c("gender"),
    `Revenue avg. (BRL)` = as_paragraph(
      as_chunk(avg_public_revenue, formatter = fmt_dbl)))

t_b2_b <- as_flextable(t_b2_b) |>
  labelizor(
    part = "header",
    labels = c(
      "week_revenue" = "Revenue registration week ",
      "gender" = "Gender",
      "race_color" = "Race/color")) |>
  add_header_row(
    values = as_paragraph(
      fmt_signif_after_zeros(v_year_election, digits = 0)),
    colwidths = 6,
    top = TRUE) |>
  align(align = "center", part = "all") |>
  width(width = 0.9) |>
  set_table_properties(layout = "fixed") |>
  flextable::style(
    pr_t = properties1,
    part = "all") |>
  flextable::style(
    pr_t = properties2,
    part = "header")

t_b2_b
```

Save table B-2

```{r}
#| warning: false
#| message: false
#| label: save-table-b-2

path = '..//Tabelas/Paper_Table_B-2_R.docx'

doc <- read_docx()
doc <- body_add_flextable(doc, value = t_b2_a)
doc <- body_add_flextable(doc, value = t_b2_b)
print(doc, target = path)
```

### 
